URN SCHEMATA AS A BASIS FOR THE DEVELOPMENT OF
CORRELATION THEORY.*

By H. L. Rietz 

It is well known that simplicity and precision are gained by the use
of urn schemata in establishing various theorems in the theory of
probability. The fundamental importance of urn schemata in mathematical
statistics is brought out well by Borel† in the statement that "the
general problem of mathematical statistics is to determine a system of
drawings carried out with urns of fixed composition, in such a way that
the results of a series of drawings lead, with a very high degree of
probability, to a table of values identical to the table of observed values."

The expression "urn schemata" is used in the present paper to
mean any games of pure chance such as those arranged with balls to be
drawn from a bag or urn, or with coins and dice to be thrown.

The urn schema back of the fundamental theorem of Bernoulli, and
the urn schema of Poisson's extension of the Bernoulli theorem are so
useful in avoiding complicated verbiage that the method of the urn
schema is the standard plan of approach to these important theorems.

The theory of correlation that has been much applied in recent years
to statistical data has been developed largely as an extension of error
theory.‡ It has long seemed to me that it would be important to invent
some games of chance that would give a meaning to the correlation
coefficient in pure chance, and that would perhaps furnish a basis from
which to proceed to develop the theory of correlation. Experiments§
have been performed with dice to show something of the meaning of the
correlation coefficient; but the methods were purely empirical and
consisted simply in recording the results of a certain number of trials
instead of approaching the problem from the standpoint of theoretical
probabilities.

It is the main purpose of the present paper to present the results of
devising certain urn schemata which may serve as a starting point of
the theory of correlation, since a vivid picture is given in this way of the
meaning of the coefficient of correlation as related to certain a priori
probabilities.

* Read before the American Mathematical Society, Sept. 4, 1919.

† Éléments de la Théorie des Probabilités, p. 167; Le Hasard, p. 154.

‡ Bravais, Analyse Mathématique sur les Probabilités des Erreurs de Situation d'un Point.
Mémoires par divers Savants, 1846.

§ Weldon, Lectures on the Method of Science, edited by T. B. Strong, Oxford, 1906, pp.
81-100.

Darbishire, Some Tables for Illustrating Statistical Correlation, Memoirs and Proceedings
of the Manchester Literary and Philosophical Society, Vol. 51, No. 16, 1907.

Case I. Pairs of drawings with balls in common taken at random from
the first drawing of a pair. An urn containing white and black balls is so
maintained that in drawing a ball the probability of getting a white ball is a
constant p and that of getting a black ball is q = 1 - p. The first drawing
of a pair is to consist of s balls taken one at a time from the urn. The second
drawing is to consist of s balls of which t are taken at random from the s first
drawn, and s - t are drawn one at a time from the urn. Then the regression
is linear, and the coefficient of correlation between the number of white balls
in first and second drawings of a pair is t/s, when the frequencies are a set
of a priori most probable frequencies.

To illustrate by means of a simple special case, we exhibit first (Fig. 1)
a correlation table for s = 5, t = 3, p = 1/4.

Showing Most Probable Frequencies for Pairs of Drawings to Illustrate
Case I for s = 5, t = 3, p = 1/4

(Rows: Number of White Balls in Second Drawings of Pairs)

            Number of White Balls in First Drawings of Pairs
                   0       1       2       3       4       5    Totals
     5                                     9       6       1        16
     4                            81     108      45       6       240
     3                   243     648     432     108       9     1,440
     2           243   1,620   1,728     648      81             4,320
     1         1,458   3,159   1,620     243                     6,480
     0         2,187   1,458     243                             3,888
  Totals       3,888   6,480   4,320   1,440     240      16    16,384

Fig. 1.



The table (Fig. 1) exhibits the a priori most probable frequencies
when we use as small numbers as possible for frequencies subject to the
condition that each frequency is to be an integer. The respective
frequencies of 0, 1, 2, 3, 4, 5 white balls in first drawings of pairs are clearly
proportional to the terms of the binomial expansion $(3/4 + 1/4)^5$, and such
frequencies are shown in the horizontal row of totals at the bottom of
the table.

The vertical arrays in the table exhibit frequencies of second drawings
of pairs such that the totals of such frequencies satisfy the condition
that they are proportional to the terms of the expansion $(3/4 + 1/4)^5$.

From the numbers in the table, we obtain for the correlation
coefficient, by the usual method of calculation, the simple result

r = 3/5.
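The entries of Fig. 1 and the value r = 3/5 can be checked by direct enumeration. The following is only a sketch under stated assumptions: the function names `case1_joint` and `moments` are illustrative (they do not come from the paper), and p must be passed as an exact `Fraction` so that the smallest integer total can be found as the least common multiple of the denominators.

```python
from fractions import Fraction
from math import comb, lcm


def case1_joint(s, t, p):
    """Exact P(x, y) for Case I; x, y = white balls in first and second drawings."""
    q = 1 - p
    joint = {}
    for x in range(s + 1):
        px = comb(s, x) * p**x * q**(s - x)            # first drawing: binomial
        for j in range(min(x, t) + 1):                 # j white among the t balls in common
            pj = Fraction(comb(x, j) * comb(s - x, t - j), comb(s, t))
            for k in range(s - t + 1):                 # k white among the s - t fresh balls
                pk = comb(s - t, k) * p**k * q**(s - t - k)
                key = (x, j + k)
                joint[key] = joint.get(key, 0) + px * pj * pk
    return joint


def moments(joint):
    """Return (covariance, variance of x, variance of y) as exact fractions."""
    mx = sum(pr * x for (x, _), pr in joint.items())
    my = sum(pr * y for (_, y), pr in joint.items())
    cov = sum(pr * (x - mx) * (y - my) for (x, y), pr in joint.items())
    vx = sum(pr * (x - mx) ** 2 for (x, _), pr in joint.items())
    vy = sum(pr * (y - my) ** 2 for (_, y), pr in joint.items())
    return cov, vx, vy


joint = case1_joint(5, 3, Fraction(1, 4))
n = lcm(*(pr.denominator for pr in joint.values()))    # smallest total giving integer cells
cov, vx, vy = moments(joint)
r = cov / vx                                           # equals cov / sqrt(vx * vy) since vx == vy
print(n, joint[(1, 1)] * n, r)
```

Multiplying each probability by `n` gives the integer frequencies of the table, so any cell can be compared with Fig. 1 one at a time.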

Proceeding now to the general case, let $N_{xy}$ (Fig. 2) represent the
a priori most probable frequency of x white balls in the first drawing
and y white balls in the second drawing of a pair. That is, $N_{xy}$ is to be
defined for each of the $(s + 1)^2$ values obtained by assigning to both
x and y the values 0, 1, 2, ..., s. The general method of obtaining these
a priori most probable frequencies in convenient form is first to derive,
in terms of s, t, p, q, x and y, the probabilities of obtaining x (x = 0, 1, 2,
..., s) white balls followed by y (y = 0, 1, 2, ..., s) white balls to make
a pair, and then to multiply these probabilities by the smallest positive
constant K that will give products each of which is an integer.

Correlation of Number of White Balls in First and Second Drawings of Pairs of Drawings
under Conditions of Case I. Table Showing the a Priori Most Probable
Frequencies of $Kq^{-s}$ Pairs of Drawings.

(Rows: White Balls in Second Drawings of Pairs. Columns: White Balls in First Drawings of Pairs.)

         0                                1                       ...  x                          ...  s                Totals
s                                                                                                                       $Kp^{s}q^{-s}$
...
y        $K{}_{s-t}C_{y}q^{s-t-y}p^{y}$                                $N_{xy}$                                         $K{}_{s}C_{y}p^{y}q^{-y}$
...
2        $K{}_{s-t}C_{2}q^{s-t-2}p^{2}$
1        $K{}_{s-t}C_{1}q^{s-t-1}p$                                                                                     $K{}_{s}C_{1}pq^{-1}$
0        $Kq^{s-t}$                       $K(s-t)pq^{s-t-1}$                                                            $K$
Totals   $K$                              $K{}_{s}C_{1}pq^{-1}$   ...  $K{}_{s}C_{x}p^{x}q^{-x}$  ...  $Kp^{s}q^{-s}$   $Kq^{-s}$

Fig. 2.

In order to express $N_{xy}$ in terms of s, t, x, y, p, and q, we shall explain
first the construction of the correlation table outlined in Fig. 2. The
first drawings of pairs are simply repeated trials with probability p of
success at drawing a white ball and q of failure at doing so. The
frequencies may therefore be taken proportional to the terms of the binomial
expansion $(q + p)^s$. The frequencies which we find it convenient to use
are the terms of this expansion times $Kq^{-s}$.

Corresponding to any number of white balls in first drawings of pairs,
the table (Fig. 2) shows a vertical array for exhibiting the frequencies of
various results of second drawings. Thus, corresponding to the drawing
of no white balls in first drawings, we have for the a priori most probable
frequencies of white balls in second drawings simply K times the various
terms of the expansion $(q + p)^{s-t}$, as shown in the vertical column under
the mark 0 for the number of white balls in first drawings.

Consider next the vertical array marked 1. This is to include the a
priori most probable values of second drawings that correspond to
drawing 1 white ball and s - 1 black balls in first drawings. Two cases arise:
All of the t balls taken at random from the first drawing may come from
the s - 1 black balls, or t - 1 may come from the s - 1 black balls and 1
may be the white ball of the first drawing. The number of ways for the
first and second of these events to occur is a constant times

${}_{s-1}C_{t}$ and ${}_{s-1}C_{t-1}$ respectively.

This array consists therefore of two subcolumns of frequencies that
may be made up by multiplying the frequencies in the vertical array
marked 0 by two numbers proportional to ${}_{s-1}C_{t}$ and ${}_{s-1}C_{t-1}$ and whose
sum is ${}_{s}C_{1}\,p/q$. That is,

\[ K_1\left({}_{s-1}C_{t} + {}_{s-1}C_{t-1}\right) = {}_{s}C_{1}\,\frac{p}{q}. \]

Since ${}_{s-1}C_{t} + {}_{s-1}C_{t-1} = {}_{s}C_{t}$, we have

\[ K_1 = \frac{{}_{s}C_{1}\,p}{{}_{s}C_{t}\,q}. \]

Hence, the multipliers are

\[ \frac{{}_{s}C_{1}\,p}{q\,{}_{s}C_{t}}\,{}_{s-1}C_{t} = \frac{p}{q}(s - t), \quad\text{and}\quad \frac{{}_{s}C_{1}\,p}{q\,{}_{s}C_{t}}\,{}_{s-1}C_{t-1} = \frac{p}{q}\,t. \qquad (A) \]

It should be noted that in viewing the subcolumns from their lower
ends upwards, the frequencies different from 0 of the subcolumns begin
at 0 white balls for the case in which we use the multiplier (p/q)(s - t)
and at 1 white ball for the case in which we use the multiplier (p/q)t.
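The simplification that leads to (A) is an instance of a general combinatorial identity: the array total ${}_{s}C_{x}$, divided in proportion to the number of ways of choosing the t balls with j white among them, reduces to ${}_{t}C_{j}\,{}_{s-t}C_{x-j}$. The sketch below checks this numerically; the function names are illustrative assumptions, and the common factor $(p/q)^x$ is left out.

```python
from fractions import Fraction
from math import comb


def multiplier_coefficient(s, t, x, j):
    """Coefficient of (p/q)^x for the subcolumn with j white among the t balls.

    Built as in the text: the array total C(s, x) is split in proportion to
    the C(x, j) * C(s - x, t - j) ways of choosing the t balls in common.
    """
    return Fraction(comb(s, x) * comb(x, j) * comb(s - x, t - j), comb(s, t))


def closed_form(s, t, x, j):
    """Simplified coefficient C(t, j) * C(s - t, x - j)."""
    return comb(t, j) * comb(s - t, x - j)


# For x = 1 the two coefficients are s - t and t, as in (A).
print(multiplier_coefficient(5, 3, 1, 0), multiplier_coefficient(5, 3, 1, 1))
```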

Consider next the vertical array marked 2. It consists of three*
subcolumns corresponding to the following ways of drawing t balls from
s - 2 black balls and 2 white balls: The number of ways in which the t
balls can be drawn to include no white ball is ${}_{s-2}C_{t}$, to include one white
ball is $2\,{}_{s-2}C_{t-1}$, and to include two white balls is ${}_{s-2}C_{t-2}$. The vertical
array marked 2 consists therefore of three subcolumns of frequencies
that are made up by multiplying the frequencies in the array marked 0 by
numbers proportional to ${}_{s-2}C_{t}$, $2\,{}_{s-2}C_{t-1}$, and ${}_{s-2}C_{t-2}$ and whose sum is
$(p^2/q^2)\,{}_{s}C_{2}$. That is,

\[ K_2\left({}_{s-2}C_{t} + 2\,{}_{s-2}C_{t-1} + {}_{s-2}C_{t-2}\right) = \frac{p^2}{q^2}\,{}_{s}C_{2}. \]

But since ${}_{s-2}C_{t} + 2\,{}_{s-2}C_{t-1} + {}_{s-2}C_{t-2} = {}_{s}C_{t}$, we have

\[ K_2 = \frac{p^2\,{}_{s}C_{2}}{q^2\,{}_{s}C_{t}}. \]

Hence, the multipliers are

\[ \frac{p^2}{q^2}\,{}_{s-t}C_{2}, \quad \frac{p^2}{q^2}\,t(s - t), \quad \frac{p^2}{q^2}\,{}_{t}C_{2}. \]

* One of the three columns would vanish if t < 2, and a different one if t > s - 2. It is
to be understood throughout the paper that ${}_{m}C_{n} = 0$ if m < n.

Consider similarly the vertical array marked 3. It clearly contains
four subcolumns corresponding to 0 white, 1 white, 2 white, and 3 white
balls among the t balls in common. Applying the same method as that
in obtaining the array marked 2, we find that the multipliers by which
to multiply frequencies in the vertical array marked 0 to get the
subcolumns of frequencies for the array marked 3 are

\[ \frac{p^3}{q^3}\,{}_{s-t}C_{3}, \quad \frac{p^3}{q^3}\,t\,{}_{s-t}C_{2}, \quad \frac{p^3}{q^3}\,{}_{t}C_{2}\,(s - t), \quad \frac{p^3}{q^3}\,{}_{t}C_{3}. \]

Similarly, it is easily shown that in the vertical array marked x there
are x + 1 subcolumns of frequencies given by multiplying the frequencies
in the vertical array marked 0 by

\[ \frac{p^x}{q^x}\,{}_{s-t}C_{x}, \quad \frac{p^x}{q^x}\,t\,{}_{s-t}C_{x-1}, \quad \frac{p^x}{q^x}\,{}_{t}C_{2}\,{}_{s-t}C_{x-2}, \quad \ldots, \quad \frac{p^x}{q^x}\,{}_{t}C_{x-1}\,{}_{s-t}C_{1}, \quad \frac{p^x}{q^x}\,{}_{t}C_{x}, \qquad (B) \]

for the cases of 0 white, 1 white, 2 white, ..., x white balls respectively
among the t balls. Some of the x + 1 subcolumns may vanish, but
this condition is met by the fact that ${}_{m}C_{n} = 0$ if m < n. Next, form a
sum of products of the above-named multipliers by those terms of the
expansion $K(q + p)^{s-t}$ that give the frequencies of exactly y white balls
in second drawings. This gives the general term

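\[
\begin{aligned}
N_{xy} = K\bigl\{ & {}_{s-t}C_{x}\,{}_{s-t}C_{y}\,q^{s-t-x-y}p^{x+y} + t\,{}_{s-t}C_{x-1}\,{}_{s-t}C_{y-1}\,q^{s-t-x-y+1}p^{x+y-1} \\
& + {}_{t}C_{2}\,{}_{s-t}C_{x-2}\,{}_{s-t}C_{y-2}\,q^{s-t-x-y+2}p^{x+y-2} + \cdots \\
& + {}_{t}C_{x}\,{}_{s-t}C_{0}\,{}_{s-t}C_{y-x}\,q^{s-t-y}p^{y} \bigr\}. \qquad (C)
\end{aligned}
\]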
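As a check on the general term, (C) can be evaluated with exact fractions and compared with the entries of Fig. 1. This is a minimal sketch: the function name `n_xy` is an illustrative assumption, and K = 3,888 is read from the total of the array marked 0 in Fig. 1.

```python
from fractions import Fraction
from math import comb


def n_xy(s, t, p, K, x, y):
    """General term (C): sum over j, the number of white balls among the t in common."""
    q = 1 - p
    total = Fraction(0)
    for j in range(min(x, y, t) + 1):
        total += (comb(t, j) * comb(s - t, x - j) * comb(s - t, y - j)
                  * q ** (s - t - x - y + j) * p ** (x + y - j))
    return K * total


p = Fraction(1, 4)
print([n_xy(5, 3, p, 3888, x, 2) for x in range(6)])   # the row y = 2 of Fig. 1
```

The combination symbols vanish automatically when their lower index exceeds the upper one, which matches the convention stated in the footnote to the array marked 2.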

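The sum of frequencies in the horizontal row marked y is given by

\[ N_y = N_{0y} + N_{1y} + N_{2y} + \cdots + N_{xy} + \cdots + N_{sy}. \qquad (D) \]

By making x = 0, 1, 2, 3, ..., s in the general expression (C) for $N_{xy}$,
we obtain

\[
\begin{aligned}
\frac{1}{K}N_{0y} &= {}_{s-t}C_{y}\,q^{s-t-y}p^{y}, \\
\frac{1}{K}N_{1y} &= {}_{s-t}C_{y}\,q^{s-t-y-1}p^{y+1}\,{}_{s-t}C_{1} + t\,{}_{s-t}C_{y-1}\,q^{s-t-y}p^{y}, \\
\frac{1}{K}N_{2y} &= {}_{s-t}C_{y}\,q^{s-t-y-2}p^{y+2}\,{}_{s-t}C_{2} + t\,{}_{s-t}C_{y-1}\,q^{s-t-y-1}p^{y+1}\,{}_{s-t}C_{1} + {}_{t}C_{2}\,{}_{s-t}C_{y-2}\,q^{s-t-y}p^{y}, \\
\frac{1}{K}N_{3y} &= {}_{s-t}C_{y}\,q^{s-t-y-3}p^{y+3}\,{}_{s-t}C_{3} + t\,{}_{s-t}C_{y-1}\,q^{s-t-y-2}p^{y+2}\,{}_{s-t}C_{2} + \cdots, \\
&\;\;\vdots \\
\frac{1}{K}N_{s-t,\,y} &= {}_{s-t}C_{y}\,{}_{s-t}C_{s-t}\,q^{-y}p^{y+s-t} + t\,{}_{s-t}C_{y-1}\,q^{-y+1}p^{y+s-t-1}\,{}_{s-t}C_{s-t-1} + \cdots, \\
\frac{1}{K}N_{s-t+1,\,y} &= 0 + t\,{}_{s-t}C_{y-1}\,q^{-y}p^{y+s-t}\,{}_{s-t}C_{s-t} + \cdots, \\
&\;\;\vdots
\end{aligned}
\]

Adding by columns, and using the fact that

\[ q^{s-t} + {}_{s-t}C_{1}\,q^{s-t-1}p + {}_{s-t}C_{2}\,q^{s-t-2}p^{2} + \cdots + p^{s-t} = 1, \]

we have

\[ \frac{1}{K}N_y = p^{y}q^{-y}\left({}_{s-t}C_{y} + t\,{}_{s-t}C_{y-1} + {}_{t}C_{2}\,{}_{s-t}C_{y-2} + \cdots + {}_{t}C_{y}\,{}_{s-t}C_{0}\right) = p^{y}q^{-y}\,{}_{s}C_{y} \]

by a well-known theorem of combinatorial analysis.
Thus, the sum of frequencies in any horizontal row marked y is

\[ N_y = Kp^{y}q^{-y}\,{}_{s}C_{y}. \qquad (E) \]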
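The theorem of combinatorial analysis used in the last step is the identity commonly known as Vandermonde's convolution, $\sum_j {}_{t}C_{j}\,{}_{s-t}C_{y-j} = {}_{s}C_{y}$. A short sketch (function names are illustrative assumptions) confirms that the row totals given by (E) agree with the Totals column of Fig. 1 when K = 3,888 and p = 1/4:

```python
from fractions import Fraction
from math import comb


def row_total_by_sum(s, t, p, K, y):
    """Row total obtained by adding the columns, before the identity is applied."""
    q = 1 - p
    inner = sum(comb(t, j) * comb(s - t, y - j) for j in range(min(t, y) + 1))
    return K * p**y * q ** (-y) * inner


def row_total_E(s, p, K, y):
    """Row total according to (E)."""
    q = 1 - p
    return K * p**y * q ** (-y) * comb(s, y)


p = Fraction(1, 4)
print([row_total_E(5, p, 3888, y) for y in range(6)])
```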

Having expressed $N_{xy}$ in terms of s, t, x, y, p, and q, and having
determined the nature of the subcolumns that make up vertical arrays, we
can express in terms of s and t the correlation coefficient

\[ r = \frac{\Sigma (x - \bar{x})(y - \bar{y})}{n\sigma_x\sigma_y}, \]

where x and y are respectively the numbers of white balls in first and
second drawings of pairs. In this case, $n = Kq^{-s}$, as is seen from adding
the totals of arrays.

When we multiply the totals of arrays by $q^s$ we obtain the terms of
the expansion of $(q + p)^s$ times the multiplier K.
Hence, it follows that $\sigma_x$ and $\sigma_y$ have the same values that they have
for a Bernoulli frequency distribution. That is,

\[ \sigma_x = \sqrt{spq}, \quad \sigma_y = \sqrt{spq}. \qquad (G) \]

By a somewhat laborious process involving the use of certain theorems
of combinatorial analysis, it results that

\[ \Sigma (x - \bar{x})(y - \bar{y}) = Ktpq^{-s+1}, \]

and hence that

\[ r = \frac{\Sigma (x - \bar{x})(y - \bar{y})}{n\sigma_x\sigma_y} = \frac{t}{s}. \qquad (H) \]

Since I have found a much simpler method of obtaining the value
of r than that which involves a separate calculation of $\Sigma (x - \bar{x})(y - \bar{y})$,
this simpler process will be presented here. It depends upon the
proposition that the means* of the set of vertical arrays (Fig. 2) lie on a straight
line of slope t/s.

In order to prove this proposition, we shall show that the difference
between the mean $M_x$ of any array marked x and the mean $M_{x-1}$ of an
array marked x - 1 is t/s.

By considering the subcolumns defined above for an array marked x
and making use of the fact that the mean number of successes in a case
of s - t trials is p(s - t), when p is the probability of success in a single
trial, we can give a formula for the mean $M_x$ derived from the means of
the subcolumns weighted with the frequencies in the subcolumns. Thus,
for the vertical array marked x, we have

\[
\begin{aligned}
{}_{s}C_{x}\,p^{x}q^{-x}M_x ={}& {}_{s-t}C_{x}\,p^{x}q^{-x}\,p(s-t) + t\,{}_{s-t}C_{x-1}[p(s-t)+1]p^{x}q^{-x} \\
&+ {}_{t}C_{2}\,{}_{s-t}C_{x-2}[p(s-t)+2]p^{x}q^{-x} + \cdots \\
&+ {}_{t}C_{x-1}(s-t)[p(s-t)+x-1]p^{x}q^{-x} \\
&+ {}_{t}C_{x}[p(s-t)+x]p^{x}q^{-x}.
\end{aligned}
\]

Similarly,

\[
\begin{aligned}
{}_{s}C_{x-1}\,p^{x-1}q^{-x+1}M_{x-1} ={}& {}_{s-t}C_{x-1}\,p(s-t)\,p^{x-1}q^{-x+1} \\
&+ t\,{}_{s-t}C_{x-2}[p(s-t)+1]p^{x-1}q^{-x+1} \\
&+ {}_{t}C_{2}\,{}_{s-t}C_{x-3}[p(s-t)+2]p^{x-1}q^{-x+1} + \cdots \\
&+ {}_{t}C_{x-2}\,{}_{s-t}C_{1}[p(s-t)+x-2]p^{x-1}q^{-x+1} \\
&+ {}_{t}C_{x-1}[p(s-t)+x-1]p^{x-1}q^{-x+1}.
\end{aligned}
\]

* The expression "mean of an array" is used very generally in statistical language as an
abbreviation for the "mean of values whose frequencies are exhibited in an array." This is
the sense in which "mean of array" is used throughout the present paper.

Then, after some simplification, in which the part p(s - t) is common to
both means while the mean number of white balls among the t balls in
common is tx/s for the array marked x, we obtain

\[ M_x - M_{x-1} = \frac{t}{s}. \qquad (I) \]

Thus, when we use the number of white balls in first drawings of
pairs as abscissas and the mean values of the number of white balls in
the corresponding second drawings of pairs as ordinates, these mean
values lie on a straight line (the line of regression) of slope t/s.
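The linearity of the means of arrays can also be confirmed numerically. In the sketch below (the function name `array_mean` is an illustrative assumption) each subcolumn mean p(s - t) + j is weighted by the proportion of ways in which j of the t balls in common are white, and successive means are seen to differ by t/s.

```python
from fractions import Fraction
from math import comb


def array_mean(s, t, p, x):
    """Mean number of white balls in second drawings, given x white in the first."""
    total = Fraction(0)
    for j in range(min(x, t) + 1):
        # proportion of ways in which j of the t balls in common are white
        weight = Fraction(comb(x, j) * comb(s - x, t - j), comb(s, t))
        total += weight * (j + p * (s - t))
    return total


means = [array_mean(5, 3, Fraction(1, 4), x) for x in range(6)]
print(means)
print([means[x] - means[x - 1] for x in range(1, 6)])   # each difference is t/s
```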

But it is well known that*

\[ y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}(x - \bar{x}) \]

is the equation of the line of regression. Since from (G), $\sigma_x = \sigma_y$, we have

\[ r = \frac{t}{s}. \]

This simple result is very interesting for the reason that the
correlation coefficient for this urn schema is thus shown to be simply the ratio
of the number of balls t in common in the two drawings to the total
number s in a drawing.

Case II. Pairs of drawings with one ball of the most numerous color in
first drawings in common.

We shall consider now an urn schema in which the correlation
coefficient does not turn out to be the ratio of the number of balls in common
to the total number in a drawing, but in which there is special interest
because this case gives us a very simple illustration of non-linear regression
from an urn schema, and because r is expressible in simple form in terms
of familiar combinations.

An urn containing an equal number of white and black balls is so
maintained that in drawing a ball the probability is 1/2 of getting a white ball. In
the first drawing of a pair, s balls are drawn one at a time from the urn giving
t of one color and s - t of the other. If t ≠ s - t, the second drawing of s
balls is to consist of s - 1 taken one at a time from the urn, and one ball
of the color showing the greater number in the first drawing of the pair.
If t = s - t, the second drawing is to consist like the first of s balls taken
one at a time from the urn. Then the regression is non-linear, and the
correlation coefficient between the number of white balls in first and second drawings
of pairs is $r = {}_{s}C_{s/2}/2^{s}$ when s is an even number, and $r = {}_{s-1}C_{(s-1)/2}/2^{s-1}$
when s is an odd number, under the condition that the frequencies are a set
of a priori most probable frequencies.

* See Yule, Introduction to the Theory of Statistics, third edition, p. 171.

In other words, the correlation coefficient is the maximum term of the
binomial expansion of $(1/2 + 1/2)^s$ if s is an even number, and the maximum
term of the expansion of $(1/2 + 1/2)^{s-1}$ when s is an odd number.
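The statement of Case II can be verified by enumerating the scheme exactly. The following sketch makes its assumptions explicit: the names `case2_joint`, `case2_r` and `max_binomial_term` are illustrative, and the correlation is computed as cov/var_x, which is legitimate here because the two variances are equal.

```python
from fractions import Fraction
from math import comb


def case2_joint(s):
    """Exact P(x, y) for Case II; x, y = white balls in first and second drawings."""
    joint = {}
    for x in range(s + 1):
        px = Fraction(comb(s, x), 2**s)
        if 2 * x == s:                       # tie: the second drawing is s fresh balls
            fresh, fixed_white = s, 0
        else:                                # one ball of the majority color is fixed
            fresh, fixed_white = s - 1, (1 if 2 * x > s else 0)
        for k in range(fresh + 1):
            joint[(x, fixed_white + k)] = px * Fraction(comb(fresh, k), 2**fresh)
    return joint


def case2_r(s):
    """Correlation coefficient; both variances equal s/4, so r = cov / var_x."""
    joint = case2_joint(s)
    mean = Fraction(s, 2)
    cov = sum(pr * (x - mean) * (y - mean) for (x, y), pr in joint.items())
    var_x = sum(pr * (x - mean) ** 2 for (x, _), pr in joint.items())
    return cov / var_x


def max_binomial_term(s):
    """Largest term of (1/2 + 1/2)^s for even s, of (1/2 + 1/2)^(s-1) for odd s."""
    n = s if s % 2 == 0 else s - 1
    return Fraction(comb(n, n // 2), 2**n)


print([(s, case2_r(s)) for s in range(2, 8)])
```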

In the table (Fig. 3) is shown for s = 5 a set of a priori most probable
frequencies with respect to the number of white balls in first and second
drawings under the conditions specified in Case II. Let coordinates (x, y)
represent the number of white balls in first and second drawings of any
pair.

(Rows: Number of White Balls in Second Drawings of Pairs)

            Number of White Balls in First Drawings of Pairs
                   0       1       2       3       4       5    Totals
     5                                    10       5       1       2^4
     4             1       5      10      40      20       4     2^4·5
     3             4      20      40      60      30       6    2^4·10
     2             6      30      60      40      20       4    2^4·10
     1             4      20      40      10       5       1     2^4·5
     0             1       5      10                               2^4
  Totals         2^4   2^4·5  2^4·10  2^4·10   2^4·5     2^4       2^9

Fig. 3.

Correlation Table for Case II when s = 5.

The broken line shown in Fig. 3 is the line of regression of second
drawings of a pair on first drawings of a pair. Let $\sigma_x$ and $\sigma_y$ be the
standard deviations of x's and y's respectively. Then it follows at once
that

\[ \sigma_x = \tfrac{1}{2}\sqrt{s}, \]

and

\[ \sigma_y = \tfrac{1}{2}\sqrt{s}, \]

from the fact that frequencies in vertical arrays are proportional to the
coefficients of $1/2^s$ in the terms of the expansion of $(1/2 + 1/2)^s$. The total
frequency is $n = 2^{2s-1}$.

Then the problem is to express the correlation coefficient

\[ r = \frac{\Sigma (x - s/2)(y - s/2)}{n\sigma_x\sigma_y} \qquad (K) \]

in as simple a form as possible in terms of s, where the sum Σ is to extend
to an entire table of values for s balls in a drawing similar to the table
shown in Fig. 3 for s = 5.

In a more convenient form, (K) may be written, when s is an odd
number, as

\[
\begin{aligned}
r &= \frac{\displaystyle\sum_{x=0}^{(s-1)/2}\sum_{y=0}^{s-1} {}_{s}C_{x}\,{}_{s-1}C_{y}\left(x - \frac{s}{2}\right)\left(y - \frac{s}{2}\right) + \sum_{x=(s+1)/2}^{s}\sum_{y=1}^{s} {}_{s}C_{x}\,{}_{s-1}C_{y-1}\left(x - \frac{s}{2}\right)\left(y - \frac{s}{2}\right)}{n\sigma_x\sigma_y} \\
&= \frac{\displaystyle 2\sum_{x=0}^{(s-1)/2}\sum_{y=0}^{s-1} {}_{s}C_{x}\,{}_{s-1}C_{y}\left(x - \frac{s}{2}\right)\left(y - \frac{s}{2}\right)}{s\,2^{2s-3}}. \qquad (L)
\end{aligned}
\]

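Examine the numerator first for the case x = 0, y = 0, 1, 2, ..., s - 1.
This subtotal gives the sum

\[ {}_{s-1}C_{0}\left(0 - \frac{s}{2}\right)\left(0 - \frac{s}{2}\right) + {}_{s-1}C_{1}\left(0 - \frac{s}{2}\right)\left(1 - \frac{s}{2}\right) + {}_{s-1}C_{2}\left(0 - \frac{s}{2}\right)\left(2 - \frac{s}{2}\right) + \cdots + {}_{s-1}C_{s-1}\left(0 - \frac{s}{2}\right)\left(s - 1 - \frac{s}{2}\right). \]

Adding, we get

\[ \left(0 - \frac{s}{2}\right)\left[-s\,2^{s-2} + (s - 1)2^{s-2}\right] = -\left(0 - \frac{s}{2}\right)2^{s-2}. \]

Similarly, add for x = 1, and we obtain $-{}_{s}C_{1}\left(1 - \frac{s}{2}\right)2^{s-2}$,

for x = 2, and we obtain $-{}_{s}C_{2}\left(2 - \frac{s}{2}\right)2^{s-2}$,

...

for x = (s - 1)/2, and we obtain $-{}_{s}C_{(s-1)/2}\left(\frac{s-1}{2} - \frac{s}{2}\right)2^{s-2}$.

Collecting for x = 0, 1, ..., (s - 1)/2, we have

\[ -\left(s\,2^{s-2} - \frac{s}{2}\,{}_{s-1}C_{(s-1)/2} - s\,2^{s-2}\right)2^{s-2} = s\,2^{s-3}\,{}_{s-1}C_{(s-1)/2}. \]

With this value substituted in (L), we obtain

\[ r = \frac{{}_{s-1}C_{(s-1)/2}}{2^{s-1}}. \qquad (M) \]

Similarly, for s an even number

\[ r = \frac{{}_{s}C_{s/2}}{2^{s}}. \qquad (N) \]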

If $r_s$ be used for the correlation coefficient under the conditions of these
drawings, we note from (M) and (N) that

$r_{2t} = r_{2t+1}$, where t is any positive integer.

The results in Case II differ very much from that in Case I, but it is
interesting that the correlation coefficient is simply the maximum term
of the binomial expansion $(1/2 + 1/2)^s$ or $(1/2 + 1/2)^{s-1}$ according as s is an
even or an odd number. It is also interesting that even this simple
Case II does not give linear regression. The means of vertical arrays
lie on three straight lines. The means of vertical arrays for x < s/2
lie on a horizontal line. The means of vertical arrays for x > s/2 lie
on another horizontal line one unit from the first line. When s is an odd
number, there are two vertical arrays nearest the middle of the table
having their means on a straight line of slope 1. When s is an even
number, there are three vertical arrays nearest the middle of the table
having their means on a straight line of slope 1/2.

Since this urn schema gives non-linear regression, it may be well to
calculate the correlation ratio* η as a substitute for the correlation
coefficient. For s an odd number, we obtain

\[ \eta = \frac{1}{\sqrt{s}}, \]

and for s an even number

\[ \eta = \sqrt{\frac{1}{s} - \frac{{}_{s}C_{s/2}}{s\,2^{s}}}. \]

* Pearson, On the General Theory of Skew Correlation and Non-Linear Regression, Drapers'
Company Research Memoirs, 1905, pp. 1-54.

From a comparison of these results with the corresponding result
from Case I, we may note that r = η = 1/s if there is one ball in common
taken at random from a first drawing as in Case I, whereas when the ball
in common is required to be one of the more numerous color as in Case II,
we get $\eta = 1/\sqrt{s}$ when s is an odd number and an approximation to this
when s is an even number.
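The means of arrays and the correlation ratio for Case II follow directly from the scheme, and a short computation may make the comparison concrete. This is a sketch only: the function names are illustrative assumptions, and the value $\sigma_y^2 = s/4$ is taken from the text rather than recomputed.

```python
from fractions import Fraction
from math import comb


def case2_array_means(s):
    """Mean of second drawings for each x: s - 1 fresh balls plus the fixed ball."""
    means = []
    for x in range(s + 1):
        if 2 * x == s:                                   # tie: s fresh balls
            means.append(Fraction(s, 2))
        else:
            means.append(Fraction(s - 1, 2) + (1 if 2 * x > s else 0))
    return means


def case2_eta_squared(s):
    """Square of the correlation ratio: variance of array means over variance of y."""
    means = case2_array_means(s)
    weights = [Fraction(comb(s, x), 2**s) for x in range(s + 1)]
    grand = sum(w * m for w, m in zip(weights, means))
    between = sum(w * (m - grand) ** 2 for w, m in zip(weights, means))
    var_y = Fraction(s, 4)                               # sigma_y^2 = s/4, as in the text
    return between / var_y


print(case2_array_means(5), case2_array_means(6))
print(case2_eta_squared(5), case2_eta_squared(4))
```

For odd s the square of η is exactly 1/s; for even s it falls short of 1/s by the amount ${}_{s}C_{s/2}/(s\,2^{s})$.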

Case III. Consider next a case with a variable number of balls in
common. Throw s coins noting the number of heads m and the number of tails
s - m. If m > s - m, leave a number of heads equal to the difference
m - (s - m) = 2m - s lying to be counted in the second throw of the
pair to be made with the remaining 2s - 2m coins. If s - m > m, leave
the difference s - 2m tails lying to be counted in the next throw with the
remaining coins. When m = s - m, throw all the s coins for the second
throw. Then the regression is linear and the correlation coefficient between
the numbers of heads that occur in pairs of throws is

\[ r = \frac{1}{\sqrt{2 - \dfrac{1}{2^{s-1}}\,{}_{s-1}C_{(s-1)/2}}} \quad \text{if s is an odd number,} \]

and

\[ r = \frac{1}{\sqrt{2 - \dfrac{1}{2^{s-1}}\,{}_{s-1}C_{s/2-1}}} \quad \text{if s is an even number,} \]

when the frequencies are a set of a priori most probable frequencies.

The following table (Fig. 4) shows for s = 7 a set of a priori most
probable frequencies with respect to number of heads in first and second
throws of pairs of throws of 7 coins in accord with the conditions of
Case III.
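A table of this kind can be generated from the rules of Case III by exact enumeration. The sketch below is offered under stated assumptions: the function names are illustrative, and the total number of pairs is taken to be the least common multiple of the denominators of the cell probabilities, which is the smallest total that makes every frequency an integer.

```python
from fractions import Fraction
from math import comb, lcm


def case3_joint(s):
    """Exact P(m, y): m heads in the first throw, y heads counted in the second."""
    joint = {}
    for m in range(s + 1):
        pm = Fraction(comb(s, m), 2**s)
        kept_heads = max(2 * m - s, 0)        # surplus heads left lying, if any
        rethrown = s - abs(2 * m - s)         # coins thrown again
        for k in range(rethrown + 1):
            joint[(m, kept_heads + k)] = pm * Fraction(comb(rethrown, k), 2**rethrown)
    return joint


def frequency_table(joint):
    """Scale probabilities to the smallest total that makes every cell an integer."""
    n = lcm(*(pr.denominator for pr in joint.values()))
    return n, {cell: int(pr * n) for cell, pr in joint.items()}


n, table = frequency_table(case3_joint(7))
row_totals = [sum(f for (_, y), f in table.items() if y == row) for row in range(8)]
print(n, row_totals)
```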

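Similar to the usage in the previous cases, the number of heads in
the first throw of a pair will be used as an abscissa and the number in the
second throw as an ordinate. Then we have at once

\[ \sigma_x = \frac{\sqrt{s}}{2}. \]

The determination of $\sigma_y$ is considered separately for the cases s even
and odd.

I. When s is an odd number. The sum of frequencies in vertical arrays
may be represented, beginning at the left, by

\[ K,\quad Ks,\quad K\,{}_{s}C_{2},\quad K\,{}_{s}C_{3},\quad K\,{}_{s}C_{4},\quad \ldots,\quad K. \qquad (Q) \]

The plan of finding $\sigma_y^2$ is to sum the second moments of all vertical
arrays about the mean s/2 of the whole vertical distribution. It is simply
necessary to weight the squares of standard deviations of binomial
distributions, each increased by the square of the distance of the mean of its
array from s/2, with numbers proportional to the numbers (Q). This gives
when we weight with numbers 1, s, ${}_{s}C_{2}$, ${}_{s}C_{3}$, ..., 1,

\[
\begin{aligned}
\sigma_y^2 &= \frac{1}{2^{s}}\Bigl\{\Bigl[0 + \bigl(0 - \tfrac{s}{2}\bigr)^2\Bigr] + s\Bigl[\tfrac{2}{4} + \bigl(1 - \tfrac{s}{2}\bigr)^2\Bigr] + {}_{s}C_{2}\Bigl[\tfrac{4}{4} + \bigl(2 - \tfrac{s}{2}\bigr)^2\Bigr] \\
&\qquad + \cdots + {}_{s}C_{(s-1)/2}\Bigl[\tfrac{s-1}{4} + \bigl(\tfrac{s-1}{2} - \tfrac{s}{2}\bigr)^2\Bigr] + \cdots + \Bigl[0 + \bigl(s - \tfrac{s}{2}\bigr)^2\Bigr]\Bigr\} \\
&= \frac{s}{2} - \frac{s}{2^{s+1}}\,{}_{s-1}C_{(s-1)/2},
\end{aligned}
\]

or

\[ \sigma_y = \sqrt{\frac{s}{2} - \frac{s}{2^{s+1}}\,{}_{s-1}C_{(s-1)/2}}. \qquad (R) \]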







(Rows: Number of Heads in Second Throws of Pairs)

                 Number of Heads in First Throws of Pairs
               0      1      2      3      4      5      6      7   Totals
     7                                    35     84    112     64      295
     6                             35    210    336    224             805
     5                            210    525    504    112            1351
     4                      84    525    700    336                   1645
     3                     336    700    525     84                   1645
     2              112    504    525    210                          1351
     1              224    336    210     35                           805
     0        64    112     84     35                                  295
  Totals      64    448   1344   2240   2240   1344    448     64     8192

Fig. 4.

Correlation of Number of Heads in First and Second Throws of Pairs under
Conditions of Case III for s = 7.

I calculated the product moments in the numerator of

\[ r = \frac{\Sigma (x - s/2)(y - s/2)}{n\sigma_x\sigma_y} \]

by obtaining first moments of the frequencies of vertical arrays about
the mean of the total distribution and then finding the product moments
by multiplying by the appropriate values of y - s/2.

But it is simpler to note from the construction of the correlation table,
as shown for s = 7 in Fig. 4, that the means of arrays lie on a straight line
of slope 1 when coordinates are taken as stated above.

Hence,

\[ r\,\frac{\sigma_y}{\sigma_x} = 1, \]

\[ r = \frac{\sigma_x}{\sigma_y} = \frac{1}{\sqrt{2 - \dfrac{1}{2^{s-1}}\,{}_{s-1}C_{(s-1)/2}}}. \qquad (S) \]

It may be of interest to examine this result for a few small odd integers.
Thus, when

s = 3, r = (1/3)√6 = 0.82,

s = 5, r = (2/13)√26 = 0.78,

s = 7, r = (4/9)√3 = 0.77,

s = 9, r = (8/221)√442 = 0.76.



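II. When s is an even number. By the same general plan as when s is
an odd number, we have

\[
\begin{aligned}
\sigma_y^2 &= \frac{1}{2^{s}}\Bigl\{\Bigl[0 + \bigl(0 - \tfrac{s}{2}\bigr)^2\Bigr] + s\Bigl[\tfrac{2}{4} + \bigl(1 - \tfrac{s}{2}\bigr)^2\Bigr] + \cdots + {}_{s}C_{s/2}\Bigl[\tfrac{s}{4} + 0\Bigr] + \cdots + \Bigl[0 + \bigl(s - \tfrac{s}{2}\bigr)^2\Bigr]\Bigr\} \\
&= \frac{s}{2} - \frac{s}{2^{s+1}}\,{}_{s-1}C_{(s/2)-1}.
\end{aligned}
\]

Hence,

\[ r = \frac{\sigma_x}{\sigma_y} = \frac{1}{\sqrt{2 - \dfrac{1}{2^{s-1}}\,{}_{s-1}C_{(s/2)-1}}}. \qquad (T) \]

To illustrate for a few special cases, we make s = 2, 8, 12.

For s = 2, r = √6/3 = 0.82.

For s = 8, r = 8√442/221 = 0.76.

For s = 12, r = (32/1817)√1817 = 0.75.

It follows at once from (S) and (T) that when s → ∞,

r = 1/√2 = 0.707+ .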
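The variance of second throws and the resulting correlation coefficient can be checked for odd and even s alike by enumerating the scheme. The sketch below uses illustrative function names and relies on the fact that, for even s, the index s/2 - 1 is the integer part of (s - 1)/2, so that one closed form serves both (S) and (T).

```python
from fractions import Fraction
from math import comb, sqrt


def case3_moments(s):
    """Exact (cov, var_x, var_y) for the coin scheme of Case III."""
    mean = Fraction(s, 2)
    cov = var_x = var_y = Fraction(0)
    for m in range(s + 1):
        pm = Fraction(comb(s, m), 2**s)
        kept_heads = max(2 * m - s, 0)
        rethrown = s - abs(2 * m - s)
        for k in range(rethrown + 1):
            pr = pm * Fraction(comb(rethrown, k), 2**rethrown)
            y = kept_heads + k
            cov += pr * (m - mean) * (y - mean)
            var_x += pr * (m - mean) ** 2
            var_y += pr * (y - mean) ** 2
    return cov, var_x, var_y


def var_y_formula(s):
    """Closed form for sigma_y^2 covering both the odd and the even case."""
    c = comb(s - 1, (s - 1) // 2)
    return Fraction(s, 2) - Fraction(s * c, 2 ** (s + 1))


def r_squared_formula(s):
    """Square of r according to (S) and (T)."""
    c = comb(s - 1, (s - 1) // 2)
    return 1 / (2 - Fraction(c, 2 ** (s - 1)))


for s in (2, 3, 5, 7, 8, 9, 12):
    print(s, r_squared_formula(s), round(sqrt(r_squared_formula(s)), 4))
```

As s grows the central binomial term tends to zero, so the square of r tends to 1/2, in agreement with the limiting value 1/√2.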

We consider next the following case of throwing two dice with one 
die taken at random in common. 

Case IV. Two dice are thrown giving a sum x. One of them taken at
random is left lying to be counted with the other thrown again. The second
trial of a pair thus made gives the sum y. Then the regression is linear and
the correlation coefficient between x and y is 1/2, when the frequencies are a set
of a priori most probable frequencies.

Fig. 5.

Correlation of Sums in First and Second Trials of Pairs of Trials under
Conditions of Case IV.

The table (Fig. 5) shows a set of a priori most probable frequencies
with respect to sums obtained in first and second throws under the
conditions of Case IV for having one die in common.

It results at once by a simple calculation that

\[ r = \frac{\Sigma (x - \bar{x})(y - \bar{y})}{n\sigma_x\sigma_y} = \frac{1}{2}. \]

Furthermore, the composition of the table makes it obvious that the
regression is linear.

Thus, we find in Case IV an analogy to the result in Case I in that the 
correlation coefficient is simply the ratio of the one die in common to the 
two dice thrown. Moreover, the condition is the same in Cases I and IV 
in that the elements in common are taken at random from first events of 
pairs. 
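The result for Case IV can be confirmed by listing all equally likely outcomes: two dice, a choice of which die stays, and the rethrow. This is a minimal sketch; the function names are illustrative assumptions, and r is computed as cov/var_x because the two variances are equal.

```python
from fractions import Fraction
from itertools import product


def case4_joint():
    """Exact P(x, y) for Case IV: x = first sum, y = kept die plus the rethrown die."""
    joint = {}
    weight = Fraction(1, 6 * 6 * 2 * 6)
    for a, b, keep_first, c in product(range(1, 7), range(1, 7), (True, False), range(1, 7)):
        kept = a if keep_first else b          # die left lying, chosen at random
        key = (a + b, kept + c)
        joint[key] = joint.get(key, 0) + weight
    return joint


def array_means(joint):
    """Mean of y for each value of x."""
    totals, sums = {}, {}
    for (x, y), pr in joint.items():
        totals[x] = totals.get(x, 0) + pr
        sums[x] = sums.get(x, 0) + pr * y
    return {x: sums[x] / totals[x] for x in totals}


joint = case4_joint()
mx = sum(pr * x for (x, _), pr in joint.items())
my = sum(pr * y for (_, y), pr in joint.items())
cov = sum(pr * (x - mx) * (y - my) for (x, y), pr in joint.items())
vx = sum(pr * (x - mx) ** 2 for (x, _), pr in joint.items())
vy = sum(pr * (y - my) ** 2 for (_, y), pr in joint.items())
print(cov / vx, array_means(joint))
```

The means of arrays printed at the end increase by 1/2 for each unit of x, which is the linear regression asserted in the text.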

We consider lastly the following case of pairs of throws of two dice 
with the die bearing the larger number in the first throw in common. 

Case V. A throw of two dice gives numbers x and y where x ≤ y. The
die y is left lying to be counted with a second throw of the die x. The second
throw of x gives z. Put into correspondence y + z with x + y. Then the
regression is non-linear and the correlation coefficient of y + z and x + y is

r = (3/181)√1086 = 0.5462+

and the correlation ratio of y + z on x + y is

η = 0.5743+

when the frequencies are a set of a priori most probable frequencies.

Fig. 6.

Correlation of Totals Thrown in First and Second Throws with Two Dice, the Larger
Number Thrown in a First Throw being Counted in Common. Case V.

The table (Fig. 6) shows a set of a priori most probable frequencies
of totals in first and second throws of pairs of throws with two dice under
the conditions stated in Case V. It is obvious from the location of means
of vertical arrays in the table that the regression is far from linear. The
coordinates of the means of arrays are

(2, 4.5), (3, 5.5), (4, 6⅙), (5, 7), (6, 7.7), (7, 8.5), (8, 8.7), (9, 9), (10, 9⅙),

(11, 9.5), (12, 9.5).

Since the regression is non-linear, we have calculated the correlation
ratio as well as the correlation coefficient, the former being, in general,
the more appropriate function for the characterization of correlation in a
case of non-linear regression given by a single-valued function. Case V
is of special interest because of its bearing on the view that all very simple
cases of correlation lead to linear regression. We have in this simple
case of pure chance a very significant departure from linear regression.
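The quantities stated for Case V can be reproduced from the 216 equally likely outcomes of two dice and one rethrow. The sketch below uses illustrative names and treats a tie in the first throw by leaving either die lying, which does not affect the result since both show the same number.

```python
from fractions import Fraction
from itertools import product
from math import sqrt


def case5_summary():
    """Return (r squared, eta squared, means of arrays) for Case V by enumeration."""
    joint = {}
    weight = Fraction(1, 216)
    for a, b, z in product(range(1, 7), repeat=3):
        first = a + b
        second = max(a, b) + z               # the larger die stays, the other is rethrown
        joint[(first, second)] = joint.get((first, second), 0) + weight
    mu = sum(pr * u for (u, _), pr in joint.items())
    mv = sum(pr * v for (_, v), pr in joint.items())
    cov = sum(pr * (u - mu) * (v - mv) for (u, v), pr in joint.items())
    var_u = sum(pr * (u - mu) ** 2 for (u, _), pr in joint.items())
    var_v = sum(pr * (v - mv) ** 2 for (_, v), pr in joint.items())
    totals, sums = {}, {}
    for (u, v), pr in joint.items():
        totals[u] = totals.get(u, 0) + pr
        sums[u] = sums.get(u, 0) + pr * v
    means = {u: sums[u] / totals[u] for u in totals}
    between = sum(totals[u] * (means[u] - mv) ** 2 for u in totals)
    return cov**2 / (var_u * var_v), between / var_v, means


r2, eta2, means = case5_summary()
print(sqrt(r2), sqrt(eta2))
print({u: float(m) for u, m in sorted(means.items())})
```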

In conclusion, the results of this paper make clear the meaning of the
correlation coefficient for certain urn schemata, and indicate that the
elements of the theory of correlation may be developed from such urn
schemata as we have devised. Such a development from a priori
probabilities seems decidedly less empirical than existing developments. It
may be urged against a development from a priori probabilities that it
neglects fluctuations in random sampling. The answer to this criticism
is that we may actually carry out the corresponding experiments with
the urn schemata when we wish to include fluctuations in sampling.
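An experiment of the kind just mentioned can be imitated with a pseudo-random number generator. The following sketch simulates the drawings of Case I; the function names, the number of pairs and the seed are illustrative assumptions, and the observed coefficient will fluctuate about t/s from one run to another.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Ordinary product-moment correlation coefficient of two samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate_case1(s, t, p, pairs, seed=0):
    """Carry out `pairs` pairs of drawings under Case I and return the observed r."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(pairs):
        first = [rng.random() < p for _ in range(s)]       # True marks a white ball
        common = rng.sample(first, t)                      # t balls kept at random
        fresh = [rng.random() < p for _ in range(s - t)]   # s - t new balls from the urn
        xs.append(sum(first))
        ys.append(sum(common) + sum(fresh))
    return pearson(xs, ys)


print(simulate_case1(5, 3, 0.25, 20000, seed=1))           # to be compared with t/s = 0.6
```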

The University of Iowa.



